A BLIS-like matrix multiplication for machine learning in the RISC-V ISA-based GAP8 processor

Abstract

We address the efficient realization of matrix multiplication (gemm), with application in the convolution operator for machine learning, on the RISC-V core present in the GreenWaves GAP8 processor. Our approach leverages BLIS (Basic Linear Algebra Instantiation Software) to develop an implementation that (1) re-organizes the gemm algorithm, adapting its micro-kernel to exploit the hardware-supported dot product kernel in the GAP8; (2) explicitly orchestrates the data transfers across the hierarchy of scratchpad memories via DMA (direct memory access); and (3) operates with integer arithmetic.
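As a rough illustration of points (1) and (3), the sketch below shows an integer gemm whose innermost step is a 4-way int8 dot product with 32-bit accumulation. It is not the authors' code: the names `dot4_i8` and `gemm_i8_dot` are hypothetical, the dot product is emulated in portable C rather than mapped to a GAP8 instruction, and the BLIS-style blocking, packing, and DMA transfers between scratchpad levels are omitted. It also assumes that B is stored transposed and that k is a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware 4-way int8 dot product with
 * 32-bit accumulation. Plain C emulation, not a GAP8 intrinsic. */
static inline int32_t dot4_i8(const int8_t *x, const int8_t *y, int32_t acc)
{
    for (int i = 0; i < 4; ++i)
        acc += (int32_t)x[i] * (int32_t)y[i];
    return acc;
}

/* C += A * B with int8 inputs and int32 outputs.
 *   A  : m x k, row-major
 *   Bt : n x k, row-major (B stored transposed, an assumption made here
 *        so that both dot-product operands are contiguous along k)
 *   C  : m x n, row-major
 * Assumes k is a multiple of 4. */
void gemm_i8_dot(size_t m, size_t n, size_t k,
                 const int8_t *A, const int8_t *Bt, int32_t *C)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int32_t acc = C[i * n + j];
            /* Each step consumes four elements of a row of A and four
             * elements of a column of B, as a dot-product unit would. */
            for (size_t p = 0; p < k; p += 4)
                acc = dot4_i8(&A[i * k + p], &Bt[j * k + p], acc);
            C[i * n + j] = acc;
        }
    }
}
```

Accumulating in 32 bits keeps the int8 products from overflowing for moderate k. A blocked implementation would wrap this kernel in loops over tiles sized to fit the scratchpad memories.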

Similar resources

Hardware accelerated approach for floating-point multiplication on 32-bit pipelined RISC-V processor

Implementing hardware support for all extensions of the RISC-V Instruction Set Architecture inside a processor would lead to avoidable area and power consumption for applications that rarely utilize a particular extension. In this paper, authors have first suggested a modified 3-stage pipeline alternative to the ZSCALE processor (32-bit) by UC Berkeley. Subsequently a hardware-accelerated appro...

Vector ISA Extension for Sparse Matrix-Vector Multiplication

In this paper we introduce a vector ISA extension to facilitate sparse matrix manipulation on vector processors (VPs). First we introduce a new Block Based Compressed Storage (BBCS) format for sparse matrix representation and a Block-wise Sparse Matrix-Vector Multiplication approach. Additionally, we propose two vector instructions, Multiple Inner Product and Accumulate (MIPA) and LoaD Section ...

Optimizing Matrix-Matrix Multiplication for an Embedded VLIW Processor

The optimization of matrix-matrix multiplication (MMM) performance has been well studied on conventional general-purpose processors like the Intel Pentium 4. Fast algorithms, such as those in the Goto and ATLAS BLAS libraries, exploit common microarchitectural features including superscalar execution and the cache and TLB hierarchy to achieve near-peak performance. However, the microarchitectur...

A fuzzy RISC processor

In this paper, we describe application-specific extensions for fuzzy processing to a general purpose processor. The application-specific instruction set extensions were defined and evaluated using hardware/software codesign techniques. Based on this approach, we have extended the MIPS instruction set architecture with only a few new instructions to significantly speed up fuzzy computation with ...

Journal

Journal title: The Journal of Supercomputing

Year: 2022

ISSN: 0920-8542, 1573-0484

DOI: https://doi.org/10.1007/s11227-022-04581-6